MIT 6.S094: Deep Learning for Self-Driving Cars
2018





* Website: selfdrivingcars.mit.edu 
* Email: deepcars@mit.edu 
* Slack: deep-mit.slack.com 


* For registered MIT students:
  * Create an account on the website.
  * Submit a DeepTraffic 2.0 neural network competition entry that achieves 65 mph by 11:59pm, Fri, Jan 19.
* Competitions
  * DeepTraffic (Deep RL in Browser)
  * SegFuse (Deep Learning in Video)
  * DeepCrash (Deep RL + Computer Vision)
* Guest Speakers (see schedule)
* 2018 Shirts (free in-person)





MIT 6.S094: Deep Learning for Self-Driving Cars, Lex Fridman (lex.mit.edu), January 2018
Massachusetts Institute of Technology, https://selfdrivingcars.mit.edu
For the full updated list of references visit: https://selfdrivingcars.mit.edu/references





DeepTraffic: Deep Reinforcement Learning 


Americans spend 8 billion hours stuck in traffic every year. 





Deep neural networks can help!

// how far the network can see: lanes to each side, patches ahead and behind
lanesSide = 3;
patchesAhead = 30;
patchesBehind = 18;
trainIterations = 10000;

// the number of other autonomous vehicles controlled by your network
otherAgents = 6; // max of 9

// size of the input layer: one value per visible grid cell
var num_inputs = (lanesSide * 2 + 1) * (patchesAhead + patchesBehind);
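
As a worked check of the input-size formula above, the sketch below plugs in the values shown on the slide. The helper name `numInputs` is hypothetical and is not part of the DeepTraffic code; the comments describe the parameters as I read them from the snippet.

```javascript
// Hypothetical helper mirroring the num_inputs formula from the snippet above.
function numInputs(lanesSide, patchesAhead, patchesBehind) {
  const lanes = lanesSide * 2 + 1;              // own lane plus lanesSide on each side
  const patches = patchesAhead + patchesBehind; // grid cells visible along each lane
  return lanes * patches;
}

const inputs = numInputs(3, 30, 18);
console.log(inputs); // 7 lanes * 48 patches = 336
```

Widening the visible grid grows the input layer multiplicatively, which is worth keeping in mind when choosing these settings.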

SegFuse: Dynamic Driving Scene Segmentation

DeepCrash: Deep RL for High-Speed Crash Avoidance

[Figure: DeepCrash simulation, learning episode 200]

DeepTesla: End-to-End Driving











[Figure: learned control (by deep neural network), steering angle vs. current time (secs)]

Lectures and Guest Talks

* Lecture: Mon, Jan 8, 7pm, Room 54-100
  Deep Learning: Overview and Recent Advances
  [ Slides ] - [ Lecture Video ] (available soon)
* Lecture: Tue, Jan 9, 7pm, Room 54-100
  Self-Driving Cars: Overview and Recent Advances
  [ Slides ] - [ Lecture Video ] (available soon)
* Lecture: Wed, Jan 10, 7pm, Room 54-100
  Deep RL for Driving Fast and Avoiding Crashes
  [ Slides ] - [ Lecture Video ] (available soon)
* Lecture: Thu, Jan 11, 7pm, Room 54-100
  Deep Learning for Driving Scene Understanding
  [ Slides ] - [ Lecture Video ] (available soon)
* Guest Talk: Fri, Jan 12, 1pm, Room 32-123 (Notice: Different time and room!)
  Sacha Arnoud
  Director of Engineering, Waymo
* Guest Talk: Tue, Jan 16, 7pm, Room 54-100
  Emilio Frazzoli
  CTO, nuTonomy
  Previously: Professor, MIT
* Lecture: Wed, Jan 17, 7pm, Room 54-100
  Deep Learning for Driver State Sensing
  [ Slides ] - [ Lecture Video ] (available soon)
* Guest Talk: Thu, Jan 18, 7pm, Room 54-100
  Oliver Cameron
  CEO, Voyage
  Previously: Head, Udacity Self-Driving Car Program
* Guest Talk: Fri, Jan 19, 7pm, Room 54-100
  Sterling Anderson
  Co-Founder, Aurora
  Previously: Director, Tesla Autopilot

Why Self-Driving Cars?

* Quite possibly, the first wide reaching and profound integration of personal robots in society.
  * Wide reaching: 1 billion cars on the road.
  * Profound: Human gives control of his/her life directly to robot.
  * Personal: One-on-one relationship of communication, collaboration, understanding and trust.

A self-driving car may be more a personal robot and less a perfect perception-control system. Why:

* Flaws need humans: The scene understanding problem requires much more than pixel-level labeling.
* Exist with humans: Achieving both an enjoyable and safe driving experience may require "driving like a human".

Why Self-Driving Cars?

* Opportunity to explore the nature of intelligence and the role of intelligent systems in society, because full autonomy may require human-level artificial intelligence.

See also our class exploring human-level artificial intelligence:
MIT 6.S099 Artificial General Intelligence
https://agi.mit.edu

Artificial General Intelligence speakers and topics:
* Ray Kurzweil: Singularity
* Andrej Karpathy: Deep Learning
* Marc Raibert: Robotics
* Josh Tenenbaum (MIT): Computational Cognitive Science
* Ilya Sutskever: Deep Reinforcement Learning
* Lisa Feldman Barrett: Emotion Creation
* Nate Derbinsky: Cognitive Modeling
* Lex Fridman: Artificial General Intelligence































Human-Centered Artificial Intelligence Approach 


90 % Needed 10 % 
* Solve the perception-control problem where possible.
* And where not possible: involve the human.

Why Deep Learning?

* Solve the perception-control problem where possible.
  Deep Learning: Learn effective perception-control from data.
* And where not possible: involve the human.
  Deep Learning: Learn effective human-robot interaction from data.

Deep Learning is Representation Learning (aka Feature Learning)

[Figure: nested fields, with deep learning inside machine learning inside artificial intelligence]

Layers, from input to output:
* Visible layer (input pixels)
* 1st hidden layer (edges)
* 2nd hidden layer (corners and contours)
* 3rd hidden layer (object parts)
* Output (object identity)

Intelligence: Ability to accomplish complex goals.
Understanding: Ability to turn complex information into simple, useful information.

Representation Matters

* Heliocentrism: Sun-Centered Model (formalized by Copernicus in the 16th century)
* Geocentrism: Earth-Centered Model

Representation Matters

[Figure: the same data plotted in Cartesian coordinates and in polar coordinates]

Task: Draw a line to separate the green triangles and blue circles.
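
A minimal sketch of why the change of coordinates helps. It assumes the two classes sit on concentric rings (the radii 1 and 3, the point counts, and all names are made up for illustration, not read off the figure): no vertical line separates them in Cartesian coordinates, while in polar coordinates the line r = 2 does.

```javascript
// Convert a Cartesian point to polar coordinates (r, theta).
function toPolar([x, y]) {
  return { r: Math.hypot(x, y), theta: Math.atan2(y, x) };
}

// Illustrative data (an assumption): points evenly spaced on a ring.
function ring(radius, count) {
  const points = [];
  for (let i = 0; i < count; i++) {
    const angle = (2 * Math.PI * i) / count;
    points.push([radius * Math.cos(angle), radius * Math.sin(angle)]);
  }
  return points;
}

const inner = ring(1, 12); // stands in for the blue circles
const outer = ring(3, 12); // stands in for the green triangles

// Polar coordinates: the line r = 2 separates the two classes.
const separable =
  inner.every((p) => toPolar(p).r < 2) && outer.every((p) => toPolar(p).r > 2);
console.log(separable); // true

// Cartesian coordinates: the x-ranges overlap, so no vertical line separates them.
const xs = (points) => points.map(([x]) => x);
const noVerticalLine = !(
  Math.max(...xs(inner)) < Math.min(...xs(outer)) ||
  Math.max(...xs(outer)) < Math.min(...xs(inner))
);
console.log(noVerticalLine); // true
```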

Representation Matters

Task: Draw a line to separate the blue curve and red curve

Deep Learning is Representation Learning (aka Feature Learning)

Task: Draw a line to separate the blue curve and red curve

Deep Learning: Scalable Machine Learning

[Figure: performance vs. amount of data for different algorithms]

* Deep learning approaches improve with more data.
* Artificial intelligence systems in the real world are all about generalizing over the edge cases.

Biological Neural Network

* Thalamocortical brain network (simulation video shown below)
  * 3 million neurons, 476 million synapses
* Full human brain:
  * 100 billion neurons, 1,000 trillion synapses

[Figure: fMRI BOLD (coronal view)]

Artificial Neural Network

* Human neural network: 100 billion neurons, 1,000 trillion synapses
* ResNet-152 neural network: 60 million synapses

Neuron: Biological Inspiration for Computation

* Neuron: computational building block for the brain

[Figure: biological neuron with dendrites, cell body, axon, branches of axon, and axon terminals; impulses are carried toward the cell body and away from the cell body]

* (Artificial) Neuron: computational building block for the "neural network"

[Figure: artificial neuron; the axon from a neuron delivers input $x_0$ across a synapse with weight $w_0$ to a dendrite, the cell body computes $\sum_i w_i x_i + b$, and the activation function $f$ produces the output axon value $f\left(\sum_i w_i x_i + b\right)$]

Differences (among others):

* Parameters: Human brains have ~10,000,000 times more synapses than artificial neural networks.
* Topology: Human brains have no "layers". Topology is complicated.
* Async: The human brain works asynchronously, ANNs work synchronously.
* Learning algorithm: ANNs use gradient descent for learning. Human brains use ... (we don't know)
* Processing speed: Single biological neurons are slow, while standard neurons in ANNs are fast.
* Power consumption: Biological neural networks use very little power compared to artificial networks.
* Stages: Biological networks usually don't stop / start learning. ANNs have different fitting (train) and prediction (evaluate) phases.

Similarity (among others):

* Distributed computation on a large scale.

Neuron: Forward Pass

[Figure: a single neuron with inputs, weights, sum, and bias]

1. Weigh
2. Sum up
3. Activate
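
The three steps above can be sketched as follows. The sigmoid activation and all numbers are assumptions chosen for illustration; they are not the values from the slide's figure.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// Forward pass of one artificial neuron.
function neuronForward(inputs, weights, bias) {
  // Steps 1 and 2: weigh each input and sum up, starting from the bias.
  let sum = bias;
  for (let i = 0; i < inputs.length; i++) {
    sum += inputs[i] * weights[i];
  }
  // Step 3: activate.
  return sigmoid(sum);
}

// Illustrative inputs, weights, and bias (assumed values).
const out = neuronForward([1.0, 0.5], [0.4, -0.2], 0.1);
console.log(out.toFixed(4));
```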

Combining Neurons into Layers

* Feed Forward Neural Network
* Recurrent Neural Network
  * Have state memory
  * Are hard to train

Combining Neurons in Hidden Layers: The "Emergent" Power to Approximate

[Figure: network with inputs $x_1, x_2, x_3$ and outputs $f^1(x_1, x_2, x_3)$ and $f^2(x_1, x_2, x_3)$]

Universality: For any arbitrary function f(x), there exists a neural network that closely approximates it for any input x.

Universality is an incredible property!* And it holds for just 1 hidden layer.
* Given that we have good algorithms for training these networks.
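
One way to build intuition for universality with a single hidden layer is to hand-construct the network instead of training it. The sketch below is an illustration under stated assumptions, not the slide's method: pairs of steep sigmoid hidden neurons form "bumps", and a weighted sum of bumps gives a piecewise-constant approximation of a target function on the interior of [0, 1]. The function names, bin count, and steepness are made up for the example.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// A "bump" that is about 1 on [a, b) and about 0 elsewhere,
// built from two steep sigmoid hidden neurons.
function bump(x, a, b, steepness) {
  return sigmoid(steepness * (x - a)) - sigmoid(steepness * (x - b));
}

// One hidden layer with 2 * bins neurons: each bump is scaled by the
// target function's value at the middle of its bin (the output weight).
function approximate(f, x, bins, steepness) {
  let output = 0;
  for (let k = 0; k < bins; k++) {
    const a = k / bins;
    const b = (k + 1) / bins;
    output += f((a + b) / 2) * bump(x, a, b, steepness);
  }
  return output;
}

const square = (x) => x * x;
const x = 0.51;
console.log(approximate(square, x, 50, 2000), square(x)); // the two values nearly agree
```

More bins (more hidden neurons) shrink the error, which matches the footnote's caveat: the construction shows such a network exists, not that training will find it.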

Deep Learning from Human and Machine

[Figure: "Teachers" (human) and "Students" (learning approaches)]

* Supervised Learning: current successes
* Augmented Supervised Learning and Semi-Supervised Learning: near-term future successes
* Reinforcement Learning: long-term future successes

Special Purpose Intelligence: Estimating Apartment Cost

Input:
* Bedrooms
* Sq. Feet
* Neighborhood (mapped to an id number)

[Figure: map of apartment prices, from $1985+ to $3235+, n=11478, 2017-12-18]

(Toward) General Purpose Intelligence: Pong to Pixels

Policy Network: raw pixels, hidden layer, probability of moving UP

* 80x80 image (difference image)
* 2 actions: up or down
* 200,000 Pong games

Andrej Karpathy. "Deep Reinforcement Learning: Pong from Pixels." 2016.
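
A minimal sketch of the forward pass of such a policy network: flattened pixels feed one hidden layer, and a sigmoid output gives the probability of moving UP. The ReLU hidden activation, the tiny hidden size, and the toy weights and frame are assumptions for illustration; no trained model is implied.

```javascript
function sigmoid(z) {
  return 1 / (1 + Math.exp(-z));
}

// pixels: flat array; w1: one weight row per hidden unit; w2: one weight per hidden unit.
function policyForward(pixels, w1, w2) {
  const hidden = w1.map((row) => {
    let z = 0;
    for (let i = 0; i < pixels.length; i++) z += row[i] * pixels[i];
    return Math.max(0, z); // ReLU hidden activation (assumed)
  });
  let logit = 0;
  for (let j = 0; j < hidden.length; j++) logit += w2[j] * hidden[j];
  return sigmoid(logit); // probability of moving UP
}

const INPUT_SIZE = 80 * 80; // 80x80 difference image, flattened
const HIDDEN_SIZE = 4;      // deliberately tiny, assumed for the example

// Deterministic toy weights and a toy difference image.
const w1 = Array.from({ length: HIDDEN_SIZE }, (_, j) =>
  Array.from({ length: INPUT_SIZE }, (_, i) => (((i + j) % 7) - 3) * 0.001));
const w2 = [0.5, -0.25, 0.1, 0.3];
const frame = Array.from({ length: INPUT_SIZE }, (_, i) => (i % 50 === 0 ? 1 : 0));

const pUp = policyForward(frame, w1, w2);
const action = pUp >= 0.5 ? "UP" : "DOWN"; // greedy pick; training would sample from pUp
console.log(pUp, action);
```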

Deep Learning: Training and Testing

Training Stage: Input Data -> Learning System <- Correct Output (aka "Ground Truth")

Testing Stage: New Input Data -> Learning System -> Best Guess

How Neural Networks Learn: Backpropagation

Forward Pass: Input Data -> Neural Network -> Prediction

Backward Pass (aka Backpropagation): Neural Network <- Measure of Error (Adjust to Reduce Error)

What can we do with Deep Learning?

Input Data -> Learning System -> Correct Output

The input data and the correct output can each be a:
* Number
* Vector of numbers
* Sequence of numbers
* Sequence of vectors of numbers

Mappings: one to one, one to many, many to one, many to many, many to many





Useful Deep Learning Terms

A mostly complete chart of Neural Networks
©2016 Fjodor van Veen - asimovinstitute.org

Cell types:
* Backfed Input Cell
* Input Cell
* Noisy Input Cell
* Hidden Cell
* Probabilistic Hidden Cell
* Spiking Hidden Cell
* Output Cell
* Match Input Output Cell
* Recurrent Cell
* Memory Cell
* Different Memory Cell
* Kernel
* Convolution or Pool

Networks:
* Perceptron (P)
* Feed Forward (FF)
* Radial Basis Network (RBF)
* Deep Feed Forward (DFF)
* Recurrent Neural Network (RNN)
* Long / Short Term Memory (LSTM)
* Gated Recurrent Unit (GRU)
* Auto Encoder (AE)
* Variational AE (VAE)
* Denoising AE (DAE)
* Sparse AE (SAE)
* Markov Chain (MC)
* Hopfield Network (HN)
* Boltzmann Machine (BM)
* Restricted BM (RBM)
* Deep Belief Network (DBN)
* Deconvolutional Network (DN)
* Deep Convolutional Inverse Graphics Network (DCIGN)
* Generative Adversarial Network (GAN)
* Liquid State Machine (LSM)
* Extreme Learning Machine (ELM)
* Echo State Network (ESN)
* Deep Residual Network (DRN)
* Kohonen Network (KN)
* Support Vector Machine (SVM)
* Neural Turing Machine (NTM)

Basic terms:
* Deep Learning ≈ Neural Networks
* Deep Learning is a subset of Machine Learning

Terms for neural networks:
* MLP: Multilayer Perceptron
* DNN: Deep neural networks
* RNN: Recurrent neural networks
  * LSTM: Long Short-Term Memory
* CNN: Convolutional neural networks
* DBN: Deep Belief Networks

Neural network operations:
* Convolution
* Pooling
* Activation function
* Backpropagation

Key Concepts: Activation Functions

[Figure: sigmoid, tanh, and ReLU activation functions and their derivatives, plotted for x from -10 to 10] [148]

* Sigmoid
  * Derivative: $\sigma'(x) = \sigma(x)(1 - \sigma(x))$
  * Vanishing gradients
  * Not zero centered
* Tanh
  * Vanishing gradients
* ReLU: $\max(0, x)$
  * Not zero centered
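
The properties listed above can be checked numerically. The sketch below uses the standard definitions of the three activations; the sample points and the convention for the ReLU derivative at zero are choices made for this example.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));
const sigmoidPrime = (x) => sigmoid(x) * (1 - sigmoid(x));
const tanhPrime = (x) => 1 - Math.tanh(x) ** 2;
const relu = (x) => Math.max(0, x);
const reluPrime = (x) => (x > 0 ? 1 : 0); // value at x = 0 set to 0 by convention here

for (const x of [-10, -2.5, 0, 2.5, 10]) {
  console.log(
    x,
    sigmoidPrime(x).toExponential(2), // tiny at the tails: vanishing gradients
    tanhPrime(x).toExponential(2),    // also tiny at the tails
    reluPrime(x)                      // 0 for negative inputs, 1 for positive
  );
}

// Sigmoid and ReLU outputs are never negative, which is what
// "not zero centered" refers to; tanh outputs span (-1, 1).
console.log(sigmoid(-10) > 0, relu(-10) >= 0, Math.tanh(-10) < 0);
```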

Key Concepts: Backpropagation

Task: Update the weights and biases to decrease loss function

Loss function: $C = \frac{(y - a)^2}{2}$

Subtasks:

1. Forward pass to compute network output and "error"
2. Backward pass to compute gradients
3. A fraction of the weight's gradient is subtracted from the weight. That fraction is the learning rate.

[Figure: network from input x to output, compared against target y, annotated with the error term of the output layer, $\delta^{(L)} = a^{(L)} - y$, the error term of the hidden layer, and the gradient computation, $\frac{\partial}{\partial \Theta_{ij}^{(l)}} J(\Theta) = a_j^{(l)} \delta_i^{(l+1)}$]
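
The three subtasks can be sketched for the smallest possible network: a single sigmoid neuron. The squared-error loss C = (y - a)^2 / 2, the sigmoid activation, and the input, target, and learning rate are assumptions chosen for illustration.

```javascript
const sigmoid = (z) => 1 / (1 + Math.exp(-z));

// One training step for a single sigmoid neuron with loss C = (y - a)^2 / 2.
function trainStep(params, x, y, learningRate) {
  // 1. Forward pass: compute the output a and the loss.
  const z = params.w * x + params.b;
  const a = sigmoid(z);
  const loss = 0.5 * (y - a) ** 2;

  // 2. Backward pass: the chain rule gives dC/dz = (a - y) * a * (1 - a).
  const delta = (a - y) * a * (1 - a);
  const gradW = delta * x;
  const gradB = delta;

  // 3. Update: subtract a fraction (the learning rate) of each gradient.
  return {
    w: params.w - learningRate * gradW,
    b: params.b - learningRate * gradB,
    loss, // loss measured before this update
  };
}

let params = { w: 0.5, b: 0.0 };
const losses = [];
for (let step = 0; step < 100; step++) {
  params = trainStep(params, 1.0, 0.0, 0.5); // illustrative input x = 1, target y = 0
  losses.push(params.loss);
}
console.log(losses[0], losses[99]); // the loss shrinks as training proceeds
```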

Learning is an Optimization Problem

Task: Update the weights and biases to decrease loss function

[Figure: loss surfaces plotted over Weight 1]

Use mini-batch or stochastic gradient descent.
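
A minimal sketch of mini-batch gradient descent on a toy problem: fitting a line with mean squared error. The data, batch size, learning rate, and epoch count are assumptions for illustration; batches are taken in order to keep the demo repeatable, whereas real training would usually shuffle each epoch. Setting `batchSize` to 1 gives stochastic gradient descent.

```javascript
// Fit y = w * x + b by mini-batch gradient descent on mean squared error.
function fitLine(xs, ys, { batchSize, learningRate, epochs }) {
  let w = 0;
  let b = 0;
  for (let epoch = 0; epoch < epochs; epoch++) {
    for (let start = 0; start < xs.length; start += batchSize) {
      const end = Math.min(start + batchSize, xs.length);
      let gradW = 0;
      let gradB = 0;
      // Accumulate the gradient over this mini-batch only.
      for (let i = start; i < end; i++) {
        const error = w * xs[i] + b - ys[i];
        gradW += error * xs[i];
        gradB += error;
      }
      const n = end - start;
      w -= learningRate * (gradW / n);
      b -= learningRate * (gradB / n);
    }
  }
  return { w, b };
}

// Noise-free toy data generated from y = 2x + 1.
const xs = Array.from({ length: 20 }, (_, i) => i / 10);
const ys = xs.map((x) => 2 * x + 1);
const fit = fitLine(xs, ys, { batchSize: 4, learningRate: 0.1, epochs: 2000 });
console.log(fit.w.toFixed(3), fit.b.toFixed(3));
```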

Optimization is Hard: Vanishing Gradients

[Figure: sigmoid function and derivative of sigmoid, plotted for x from -10 to 10; the derivative is zero at the tails]

$\frac{d\sigma(x)}{dx} = (1 - \sigma(x))\,\sigma(x)$

Partial derivatives are small => Learning is slow
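
A small numeric illustration of why this slows learning in deep networks. Backpropagation multiplies one activation derivative per layer; the sketch ignores the weights (a simplifying assumption) and multiplies sigmoid derivatives only.

```javascript
const sigmoid = (x) => 1 / (1 + Math.exp(-x));
const sigmoidPrime = (x) => sigmoid(x) * (1 - sigmoid(x));

// Product of sigmoid derivatives across layers, one pre-activation per layer.
function gradientFactor(preActivations) {
  return preActivations.reduce((product, z) => product * sigmoidPrime(z), 1);
}

console.log(sigmoidPrime(0));  // 0.25, the largest value the derivative can take
console.log(sigmoidPrime(10)); // about 4.5e-5, nearly zero at the tail
console.log(gradientFactor(new Array(10).fill(0))); // 0.25^10, about 9.5e-7
```

Even in the best case the factor shrinks by at least 4x per sigmoid layer, so early layers receive very small updates.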

Optimization is Hard: Dying ReLUs

[Figure: ReLU function and derivative of ReLU; the derivative is exactly zero for negative inputs]

* If a neuron is initialized poorly, it might not fire for the entire training dataset.
* Large parts of your network could be dead ReLUs!
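
A minimal sketch of a dead ReLU. The toy dataset and the two initializations are assumptions for illustration: with a very negative bias the neuron outputs zero on every example, so its gradient is zero everywhere and gradient descent cannot revive it.

```javascript
const relu = (z) => Math.max(0, z);

// Fraction of the dataset on which a one-input ReLU neuron fires (output > 0).
function firingRate(inputs, weight, bias) {
  const active = inputs.filter((x) => relu(weight * x + bias) > 0).length;
  return active / inputs.length;
}

// Toy dataset (assumed): inputs spread over [0, 1].
const inputs = Array.from({ length: 11 }, (_, i) => i / 10);

console.log(firingRate(inputs, 1.0, -0.5)); // fires on part of the data
console.log(firingRate(inputs, 1.0, -5.0)); // 0: never fires, so it never gets a gradient
```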

Optimization is Hard: Saddle Point

[Figure: optimizer trajectories on two loss surfaces for SGD, Momentum, NAG, Adagrad, Adadelta, and Rmsprop]

* Hard to break symmetry.
* Vanilla SGD gets you there, but is slow sometimes.

Key Concepts: Overfitting and Regularization

* Help the network generalize to data it hasn't seen.
* Big problem for small datasets.
* Overfitting example (a sine curve vs 9-degree polynomial):

[Figure: sine curve data fitted by a 9-degree polynomial]

* Overfitting: The error decreases in the training set but increases in the test set.

[Figure: training and test error curves]

Key Concepts: Regularization: Early Stoppage

[Figure: the original set is split into training and testing; the training part is split again into training and validation]

* Create "validation" set (subset of the training set).
* Validation set is assumed to be representative of the testing set.
* Early stoppage: Stop training (or at least save a checkpoint) when performance on the validation set decreases.
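
A minimal sketch of the early-stoppage rule applied to a recorded list of validation losses. The `patience` window (how many non-improving epochs to tolerate before stopping) is an addition commonly used in practice and is an assumption here, as are the loss values.

```javascript
// Return the epoch of the best checkpoint, stopping once the validation
// loss has failed to improve for `patience` consecutive epochs.
function earlyStoppingEpoch(validationLosses, patience) {
  let bestEpoch = 0;
  let bestLoss = Infinity;
  for (let epoch = 0; epoch < validationLosses.length; epoch++) {
    if (validationLosses[epoch] < bestLoss) {
      bestLoss = validationLosses[epoch];
      bestEpoch = epoch; // this is where a checkpoint would be saved
    } else if (epoch - bestEpoch >= patience) {
      break; // validation performance has stopped improving
    }
  }
  return bestEpoch;
}

// Made-up validation losses: improving at first, then getting worse.
const validationLosses = [0.9, 0.7, 0.55, 0.5, 0.52, 0.56, 0.61, 0.7];
console.log(earlyStoppingEpoch(validationLosses, 2)); // 3
```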

Key Concepts: Regularization: Dropout

[Figure: input layer, hidden fc layer, dropout layer with p = 0.5, and output layer at training time]

* Dropout: Randomly remove some nodes in the network (along with incoming and outgoing edges)
* Notes:
  * Usually p >= 0.5 (p is probability of keeping node)
  * Input layers p should be much higher (and use noise instead of dropout)
  * Most deep learning frameworks come with a dropout layer
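
A minimal sketch of a dropout layer at training time, where p is the probability of keeping a node as in the notes above. It uses the "inverted dropout" variant, which rescales the surviving activations by 1/p so their expected value is unchanged; that rescaling, the function names, and the small seeded generator (used only to make the demo repeatable) are assumptions, not details from the slide.

```javascript
// Keep each activation with probability keepProb; rescale survivors by 1 / keepProb.
function dropout(activations, keepProb, random = Math.random) {
  return activations.map((a) => (random() < keepProb ? a / keepProb : 0));
}

// Small linear congruential generator so the demo gives the same result every run.
function makeRandom(seed) {
  let state = seed >>> 0;
  return () => {
    state = (Math.imul(state, 1664525) + 1013904223) >>> 0;
    return state / 4294967296;
  };
}

const hidden = new Array(1000).fill(1.0);
const dropped = dropout(hidden, 0.5, makeRandom(42));
const kept = dropped.filter((a) => a !== 0).length;
const mean = dropped.reduce((sum, a) => sum + a, 0) / dropped.length;
console.log(kept, mean); // roughly half the nodes kept; mean stays close to 1.0
```

At prediction time the layer is simply skipped, which is why the rescaling is done during training.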

Key Concepts: Regularization: Weight Penalty (aka Weight Decay)

* L2 Penalty: Penalize squared weights. Result:
  * Keeps weights small unless the error derivative is very large.
  * Prevents fitting the sampling error.
  * Smoother model (output changes more slowly as the input changes).
  * If network has two similar inputs, it prefers to put half the weight on each (W/2, W/2) rather than all the weight on one (W, 0).
* L1 Penalty: Penalize absolute weights. Result:
  * Allow for a few weights to remain large.
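
The preference for splitting weight across two similar inputs can be checked directly. The value W = 2 is an arbitrary choice for illustration.

```javascript
const l2Penalty = (weights) => weights.reduce((sum, w) => sum + w * w, 0);
const l1Penalty = (weights) => weights.reduce((sum, w) => sum + Math.abs(w), 0);

const W = 2.0;
const allOnOne = [W, 0];      // all the weight on one of two similar inputs
const split = [W / 2, W / 2]; // half the weight on each

console.log(l2Penalty(allOnOne), l2Penalty(split)); // 4 vs 2: L2 prefers the split
console.log(l1Penalty(allOnOne), l1Penalty(split)); // 2 vs 2: L1 is indifferent
```

Because L1 does not reward spreading weight out, it leaves room for a few weights to stay large while others go to zero.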

Neural Network Playground

http://playground.tensorflow.org

[Screenshot: playground with data, features, 2 hidden layers, and output; learning rate 0.03, activation Tanh, regularization None, problem type Classification; test loss 0.489, training loss 0.498]

Deep Learning Breakthroughs: What Changed?

* Compute: CPUs, GPUs, ASICs
* Organized large(-ish) datasets: ImageNet
* Algorithms and research: Backprop, CNN, LSTM
* Software and Infrastructure: Git, ROS, PR2, AWS, Amazon Mechanical Turk, TensorFlow, ...
* Financial backing of large companies: Google, Facebook, Amazon, ...

[Figure: Microprocessor Transistor Counts 1971-2011 & Moore's Law]
[Figure: CPU (multiple cores) vs. GPU (thousands of cores)]

Deep Learning: Our intuition about what's "hard" is flawed (in complicated ways)

* Visual perception: 540,000,000 years of data
* Bipedal movement: 230,000,000 years of data
* Abstract thought: 100,000 years of data

[Figure: an image predicted as "Dog" is, after adding a distortion, predicted as "Ostrich"]

"Encoded in the large, highly evolved sensory and motor portions of the human brain is a billion years of experience about the nature of the world and how to survive in it.... Abstract thought, though, is a new trick, perhaps less than 100 thousand years old. We have not yet mastered it. It is not all that intrinsically difficult; it just seems so when we do it."
- Hans Moravec, Mind Children (1988)

Deep Learning is Hard: Illumination Variability

Deep Learning is Hard: Pose Variability and Occlusions

[Figure 1: The deformable and truncated cat.]
Parkhi et al. "The truth about cats and dogs." 2011.

Deep Learning is Hard: Intra-Class Variability

[Figure: example breeds: Bombay, Persian, Ragdoll, Keeshond, Great Pyrenees, German Shorthaired, Chihuahua]
Parkhi et al. "Cats and dogs." 2012.

Object Recognition / Classification

[Figure: convolutional network. Feature extraction: input 32 x 32, convolution to C1 feature maps 28 x 28, 2x2 subsampling to S1 feature maps 14 x 14, convolution to C2 feature maps 10 x 10, 2x2 subsampling to S2 feature maps 5 x 5. Classification: fully connected layers n1 and n2 to the output]

Example labels: bumper car, golfcart, snow leopard, Egyptian cat
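
The feature-map sizes in the figure follow from the usual output-size formula for convolution and pooling layers. The sketch below reproduces the 32, 28, 14, 10, 5 chain; the 5x5 convolution kernel with stride 1 and no padding is inferred from those sizes and is an assumption, not something the slide states.

```javascript
// Output width of a convolution or pooling layer (no dilation).
function outputSize(inputSize, kernelSize, stride = 1, padding = 0) {
  return Math.floor((inputSize + 2 * padding - kernelSize) / stride) + 1;
}

let size = 32;
size = outputSize(size, 5);    // convolution, 5x5 kernel assumed -> 28
size = outputSize(size, 2, 2); // 2x2 subsampling, stride 2       -> 14
size = outputSize(size, 5);    // convolution, 5x5 kernel assumed -> 10
size = outputSize(size, 2, 2); // 2x2 subsampling, stride 2       -> 5
console.log(size); // 5
```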

What is ImageNet?

* ImageNet: dataset of 14+ million images (21,841 categories)
* Let's take the high level category of fruit as an example:
  * Total 188,000 images of fruit
  * There are 1206 Granny Smith apples

What is ImageNet?

* Dataset: ImageNet, a dataset of 14+ million images
* Competition: ILSVRC, the ImageNet Large Scale Visual Recognition Challenge
* Networks:
  * AlexNet (2012)
  * ZFNet (2013)
  * VGGNet (2014)
  * GoogLeNet (2014)
  * ResNet (2015)
  * CUImage (2016)
  * Squeeze-and-Excitation Networks (2017)

ILSVRC Challenge Evaluation for Classification

* Top 5 error rate:
  * You get 5 guesses to get the correct label
  * ~20% reduction in accuracy for Top 1 vs Top 5
* Human annotation is a binary task: "apple" or "not apple"

Image classification example (ground truth: Steel drum):
* Guesses: Steel drum, Folding chair, Loudspeaker. Accuracy: 1
* Guesses: Scale, T-shirt, Steel drum, Drumstick, Mud turtle. Accuracy: 1
* Guesses: Scale, T-shirt, Giant panda, Drumstick, Mud turtle. Accuracy: 0

References: [123]
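
A minimal sketch of the top-5 scoring rule, using two of the guess lists from the example above. The function names and the data layout are hypothetical.

```javascript
// A prediction scores 1 if the ground-truth label is among the first k guesses.
function topKAccuracy(guesses, groundTruth, k = 5) {
  return guesses.slice(0, k).includes(groundTruth) ? 1 : 0;
}

// Error rate over a set of examples: the fraction that scored 0.
function topKErrorRate(examples, k = 5) {
  const correct = examples.reduce(
    (sum, example) => sum + topKAccuracy(example.guesses, example.groundTruth, k),
    0
  );
  return 1 - correct / examples.length;
}

const examples = [
  {
    groundTruth: "Steel drum",
    guesses: ["Scale", "T-shirt", "Steel drum", "Drumstick", "Mud turtle"],
  },
  {
    groundTruth: "Steel drum",
    guesses: ["Scale", "T-shirt", "Giant panda", "Drumstick", "Mud turtle"],
  },
];

console.log(topKErrorRate(examples, 5)); // 0.5
console.log(topKErrorRate(examples, 1)); // 1
```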

[Figure: ILSVRC classification error by year, 2010 to 2017]

* AlexNet (2012): First CNN (15.4%)
  * 8 layers
  * 61 million parameters
* ZFNet (2013): 15.4% to 11.2%
  * 8 layers
  * More filters. Denser stride.
* VGGNet (2014): 11.2% to 7.3%
  * 2x2 max pool
  * 16 layers
  * 138 million parameters
* GoogLeNet (2014): 11.2% to 6.7%
  * Inception modules
  * 22 layers
  * 5 million parameters (throw away fully connected layers)
* ResNet (2015): 6.7% to 3.57%
  * More layers = better performance
  * 152 layers
* CUImage (2016): 3.57% to 2.99%
* SENet (2017): 2.99% to 2.251%
  * Squeeze and excitation block: network is allowed to adaptively adjust the weighting of each feature map in the convolutional block.

* Human error: 5.1%
  * Surpassed in 2015
* 2018: ImageNet Challenge moves to Kaggle

Same Architecture, Many Applications

[Figure: convolution + nonlinearity and max pooling layers, repeated (convolution + pooling layers), then a vector fed to fully connected layers for N x binary classification]

The final part might look different for:
* Different image classification domains
* Image captioning with recurrent neural networks
* Image object localization with bounding box
* Image segmentation with fully convolutional networks
* Image segmentation with deconvolution layers

Pixel-Level Full Scene Segmentation

[Figure: convolutionalization turns a classifier that outputs "tabby cat" into a network that outputs a tabby cat heatmap]

Colorization of Images

[Figure: network with layers conv1 to conv8 (some à trous / dilated) that maps lightness L to an (a, b) probability distribution, giving color ab and the final Lab image]

Object Detection

1. Input image
2. Extract region proposals (~2k)
3. Compute CNN features
4. Classify regions

Background Removal (2017)

[Figure: network built from dense blocks, convolutions, transition down and transition up layers, skip connections, and concatenation]

pix2pixHD: generate high-resolution photo-realistic images from semantic label maps (2017)

Flavors of Neural Networks

* Vanilla Neural Networks: one to one
* Recurrent Neural Networks: one to many, many to one, many to many, many to many

Handwriting Generation from Text

Input: text of up to 100 characters (lower case letters work best), for example "Deep Learning for Self Driving Cars"
Hidden Layers
Output: generated handwriting

Applications: Image Caption Generation

[Figure: captioning pipeline]
* Detected words: woman, crowd, cat, camera, holding, purple
* Candidate sentences: "A purple camera with a woman." "A woman holding a camera in a crowd." "A woman holding a cat."
* Top-ranked caption (#1): "A woman holding a camera in a crowd."

[Figure: example captions: "a man sitting on a couch with a dog", "a man sitting on a chair with a dog in his lap"]

Video Description Generation

[Figure: examples of correct descriptions and of relevant but incorrect descriptions, for example S2VT: "A man is cutting a piece of a pair of a paper."]
[Figure: sequence-to-sequence model with an encoding stage and a decoding stage over time, using <pad> tokens]

Venugopalan et al. "Sequence to sequence-video to text." 2015.
Code: https://vsubhashini.github.io/s2vt.html

Modeling Attention Steering

Jimmy Ba, Volodymyr Mnih, and Koray Kavukcuoglu. "Multiple object recognition with visual attention." (2014).

Drawing with Selective Attention

[Figure: reading and writing with selective attention]

Gregor et al. "DRAW: A recurrent neural network for image generation." (2015). Code: https://github.com/ericjang/draw

(Toward) General Purpose Intelligence: Pong to Pixels

Policy Network: raw pixels, hidden layer, probability of moving UP

* 80x80 image (difference image)
* 2 actions: up or down
* 200,000 Pong games

AlphaGo (2016) Beat Top Human at Go

[Figure: training pipeline: human expert positions, supervised learning policy network, reinforcement learning policy network, self-play data, value network]

[Figure: calibration of computer programs against human players]
* AlphaGo (Mar 2016): DeepMind challenge match against Lee Sedol (9p), top player of past decade
* AlphaGo (Oct 2015): Nature match against Fan Hui (2p), 3-times reigning Euro Champion
* Crazy Stone and Zen: amateur humans

AlphaGo Zero (2017): Beats AlphaGo

[Figure: Elo rating over training (x-axis 0 to 40) for AlphaGo Zero 40 blocks, AlphaGo Lee, and AlphaGo Master]

DeepStack first to beat professional poker players (2017) (in heads-up poker)

[Figure: network: card ranges are bucketed into the input (bucket ranges); a feedforward network with 7 hidden layers (fully connected, linear, PReLU) outputs bucket values; a zero-sum error step gives zero-sum output; inverse bucketing gives card counterfactual values]

Current Drawbacks

Defining a good reward function is difficult... Coast Runners: Discovers local pockets of high reward, ignoring the "implied" bigger picture goal of finishing the race.

In addition, specifying a reward function for self-driving cars raises ethical questions...

Robustness: >99.6% Confidence in the Wrong Answer

Robustness: Fooled by a Little Distortion

[Figure: two examples in which a correctly classified image plus a small distortion is classified as "ostrich"]

Current Challenges

* Transfer learning: Unable to transfer representation to most reasonably related domains except in specialized formulations.
* Understanding: Lacks "reasoning" or ability to truly derive "understanding" as previously defined on anything but specialized problem formulations. (Definition used: Ability to turn complex information into simple, useful information.)
* Requires big data: inefficient at learning from data
* Requires supervised data: costly to annotate real-world data
* Not fully automated: Needs hyperparameter tuning for training: learning rate, loss function, mini-batch size, training iterations, momentum, optimizer selection, etc.
* Reward: Defining a good reward function is difficult.
* Transparency: Neural networks are for the most part black boxes (for real-world applications) even with tools that visualize various aspects of their operation.
* Edge cases: Deep learning is not good at dealing with edge cases.

Why Deep Learning?

* Solve the perception-control problem where possible.
  Deep Learning: Learn effective perception-control from data.
* And where not possible: involve the human.
  Deep Learning: Learn effective human-robot interaction from data.

Thank You

[Sponsor logos: NVIDIA, Google, Autoliv, Amazon Alexa, Toyota Collaborative Safety Research Center (CSRC)]

Thank You

      Country          Sessions   % Sessions
  1.  United States      98,928       32.89%
  2.  India              29,352        9.76%
  3.  China              20,407        6.78%
  4.  Germany            15,718        5.23%
  5.  South Korea        10,493        3.49%
  6.  Canada              8,728        2.90%
  7.  United Kingdom      8,717        2.90%
  8.  Japan               7,543        2.51%
  9.  Russia              6,594        2.19%
 10.  Taiwan              6,353        2.11%